-
Notifications
You must be signed in to change notification settings - Fork 148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid resetting the VF if the netns does not exist #313
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: Andrea Panattoni <[email protected]>
Pull Request Test Coverage Report for Build 12372904113Details
💛 - Coveralls |
cmd/sriov/main.go
Outdated
|
||
if !netConf.DPDKMode { | ||
netns, err := ns.GetNS(args.Netns) | ||
netns, err = ns.GetNS(args.Netns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should always do this not only for non dpdk devices.
if not we should be able to reproduce the issue on a vfio-pci device for example (maybe not using the ipam)
but for example if the driver is slow and we need to reset the vlan on the vf and that part takes a lot of time we maybe be able to reproduce the issue no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a good point. I updated the code so that if the net namespace doesn't exist, then Release and Reset are not called.
WDYT?
ad3c025
to
95fcfd3
Compare
The VF might have been assigned to another running Pod if the network namespace is not present. In cases like this, resetting the VF might break the configuration of the running Pod. The above scenario might occur when the IPAM plugin takes a lot of time to complete, and the CmdDel gets called multiple times. Check if the namespace is still present before resetting the VF Signed-off-by: Andrea Panattoni <[email protected]>
95fcfd3
to
dcdaec1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM! nice work!
@adrianchiris can you please give this one a look so we can merge it? |
"func", "cmdDel", | ||
"netConf.DeviceID", netConf.DeviceID, | ||
"args.Netns", args.Netns) | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this case dont we want to skip to L315 and mark pci address as released ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better not:
The scenario I'm trying to solve here is when a CmdDelete is called when the device has already been allocated to another Pod. If we call DeleteAllocatedPCI(...)
, we are freeing up a device on a running pod.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
took another look at the codebase, can this really happen ? we use the pci allocator to check if the device is allocated in CmdAdd, see [1]
so even if NS was deleted and device was re-assigned to a new pod, its CNI call would fail on ADD until we released the device.
if my understanding of the above is true then why dont we want to reset Vf config in this case ?
[1]
sriov-cni/pkg/config/config.go
Line 60 in fca6591
isAllocated, err := allocator.IsAllocated(n.DeviceID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right; in the scenario I just described, it probably can't happen.
What we saw happening is related to a very long execution of ipam.ExecDel(...)
.
This scenario includes 3 sriov-cni calls:
a. first_cniDEL starts and call ipam.ExecDel(...). this is going to take a very long time (~2 minutes)
b. second_cniDEL is invoked and, for some reason, it completes quite quickly (ipam deletion, vlan/mac reset, ...). PIC is deallocated, netns is deleted.
c. cniADD is invoked. It assigns the VF to a new pod, configuring VLAN, MAC, ...
d. first_cniDEL continues its execution, as the ipam.ExecDel
finished. Here, what I want to avoid is resetting the VF MAC, VLAN and so on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe a better solution is to use lock file to handle on a single cmd ADD/DEL at a time for a given ns, network, interface ?
that way second cni DEL will fail and prevent freeing up the PCI in the allocator ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking at the code, we save PCI device at the end of the cmd add call.
in this case we want to "lock" the call for the ns/network/interface at the beginning.
so if we want to use the same file we need to be careful (save at the beginning and release in case of failure)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right so maybe having a tmp file for ns/network/interface
is the right way going forward but we must be sure we clean that file.
and we cover a case of unexpected reboot so maybe putting this lock file in /tmp/ so on reboot it will get clean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of /tmp, we should go through all the files we have put on the fs when we start up the operator controller, and ensure our local db (files we put on the file system) is clean. There is no guarantee that /tmp will be cleared.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can flock the pci file if it works. just need to make sure we release when we exit ADD and DEL cmds.
so , according to flock docs[1] it would release the lock if the process exits so in case of reboot we would not be stuck in a deadlock (it would be released when process exits)
nasty bug :) |
The VF might have been assigned to another running
Pod if the network namespace is not present. In cases
like this, resetting the VF might break the configuration
of the running Pod.
The above scenario might occur when the IPAM plugin takes a lot of time to complete, and the CmdDel gets called
multiple times.
Check if the namespace is still present before resetting the VF
@bn222 @SchSeba WDYT about this approach?